Analysis of the Temporal Behaviour of Search Engine Crawlers at Web Sites

نویسندگان

  • Jeeva Jose
  • P. Sojan Lal
چکیده

Web log mining is the extraction of web logs to analyze user behaviour at web sites. In addition to user information, web logs provide immense information about search engine traffic and behaviour. Search engine crawlers are highly automated programs that periodically visit the web site to collect information. The behaviour of search engines could be used in analyzing server load, quality of search engines, dynamics of search engine crawlers, ethics of search engines etc. The time spent by various crawlers is significant in identifying the server load as major proportion of the server load is constituted by search engine crawlers. A temporal analysis of the search engine crawlers were done to identify their behaviour. It was found that there is a significant difference in the total time spent by various crawlers. The presence of search engine crawlers at web sites on hourly basis was also done to identify the dynamics of search engine crawlers at web sites.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Mining Web Logs to Identify Search Engine Behaviour at Websites

Web Usage Mining also known as Web Log Mining is the extraction of user behaviour from web log data. The log files also provide immense information about the search engine traffic at a website. This search engine traffic is helpful to analyse the ethics of search engines, quality of the crawlers, periodicity of the visits and also the server load. Search engine crawlers are automated programs w...

متن کامل

Differences in Time Delay between Search Engine Crawlers at Web Sites

Web log mining provides tremendous information about user traffic and search engine behavior at web sites. The behavior of search engines could be used in analyzing server load, quality of search engines, dynamics of search engine crawlers, ethics of search engines etc. Search engine crawlers are highly automated programs which are seldom regulated manually. These crawlers periodically visit th...

متن کامل

Identify Temporal Websites Based on User Behavior Analysis

The web is growing at a rapid speed and it is almost impossible for a web crawler to download all new pages. Pages reporting breaking news should be stored into search engine index as soon as they are published, while others whose content is not time-related can be left for later crawls. We collected and analyzed into users’ page-view data of 75,112,357 pages for 60 days. Using this data, we fo...

متن کامل

Workload-Aware Web Crawling and Server Workload Detection

With the development of search engines, more and more web crawlers are used to gather web pages. The rising crawling traffic has brought the concern that crawlers may impact web sites. On the other hand, more efficient crawling strategy is required for the coverage and freshness of search engine index. In this paper, crawlers of several major search engines are analyzed using one six-months acc...

متن کامل

An investigation of web crawler behavior: characterization and metrics

In this paper, we present a characterization study of search-engine crawlers. For the purposes of our work, we use Web-server access logs from five academic sites in three different countries. Based on these logs, we analyze the activity of different crawlers that belong to five search engines: Google, AltaVista, Inktomi, FastSearch and CiteSeer. We compare crawler behavior to the characteristi...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013